Learning in Graphs from the Netzschleuder Repository
Prerequisites
First, we need to set up a Python environment with PyTorch, PyTorch Geometric and pathpyG installed. Depending on where you run this notebook, this may already be (partially) done. For example, Google Colab has PyTorch installed by default, so we only need to install the remaining dependencies. The DevContainer that is part of our GitHub repository, on the other hand, already has all necessary dependencies installed.
In the following, we install the packages for use in Google Colab using Jupyter magic commands. For other environments, comment the commands in or out as necessary. For more details on installing pathpyG, especially with GPU support, we refer to our documentation. Note that %%capture discards the full output of the cell so that this tutorial is not cluttered with installation details. To print the output, comment out %%capture.
%%capture
# !pip install torch
!pip install torch_geometric
!pip install git+https://github.com/pathpy/pathpyG.git
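Before running the rest of the notebook, it can help to confirm that the three packages are actually importable. The following is a minimal sketch using only the standard library. The helper name check_dependencies is hypothetical, and the module names are simply the import names used later in this tutorial.

```python
import importlib.util


def check_dependencies(modules=("torch", "torch_geometric", "pathpyG")):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


# Report which of the required packages are visible to this interpreter.
status = check_dependencies()
for name, available in status.items():
    print(f"{name}: {'found' if available else 'MISSING'}")
```

If a package is reported as missing, uncomment the corresponding install command in the cell above and run it again.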
Motivation and Learning Objectives
Access to a large number of graphs with different topological characteristics and from different domains is crucial for the development and evaluation of graph learning methods. Thousands of graph data sets are scattered throughout the web, often in different data formats and with missing information on their actual origin. To address this issue, the Netzschleuder Online Repository by Tiago Peixoto provides a single repository of graphs in a single format, including descriptions, citations, and node-, edge- or graph-level meta-data. To facilitate the development of graph learning techniques, pathpyG provides a feature that reads networks directly from the Netzschleuder repository via an API.
In this brief unit, we will learn how we can retrieve network records and graph data from the netzschleuder repository. We will further demonstrate how we can conveniently apply a Graph Neural Network to predict node-level categories contained in the meta-data.
We first need to import a few modules.
import numpy as np
from matplotlib import pyplot as plt
from sklearn import metrics
from sklearn.decomposition import TruncatedSVD
import torch
from torch.nn import Linear, ReLU, Sigmoid, Parameter
import torch_geometric
from torch_geometric.nn import Sequential, GCNConv, SimpleConv, MessagePassing
import pathpyG as pp
pp.config['torch']['device'] = 'cpu'
# pp.config['torch']['device'] = 'cuda'
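The cell above hard-codes the device. As a hedged alternative, the sketch below picks 'cuda' only when PyTorch reports an available GPU and falls back to 'cpu' otherwise. The helper pick_device is a hypothetical name introduced here for illustration; it is not part of pathpyG.

```python
def pick_device():
    """Return 'cuda' if PyTorch is installed and sees a GPU, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on a GPU, so default to CPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The result could then be assigned with pp.config['torch']['device'] = device, mirroring the assignment in the cell above.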
Reading graphs from the netzschleuder repository
The pathpyG.io module contains a function that reads graph data from the API.
We can read a given network from the Netzschleuder database using its record name. Just browse the Netzschleuder Online Repository to find the record names. As an example, we use the record email_company, a temporal graph capturing e-mail exchanges within a company.
g = pp.io.read_netzschleuder_graph('email_company')
print(g)
Mapping node attributes based on node indices in column `index`
Temporal Graph with 167 nodes, 5784 unique edges and 82927 events in [1262454016.0, 1285884544.0]

Node attributes
	node__pos		<class 'numpy.ndarray'>

Edge attributes
	time		<class 'torch.Tensor'> -> torch.Size([82927])
	edge_index		<class 'torch_geometric.edge_index.EdgeIndex'>

Graph attributes
	analyses_global_clustering		<class 'float'>
	analyses_is_directed		<class 'bool'>
	analyses_transition_gap		<class 'float'>
	analyses_edge_properties		<class 'list'>
	analyses_num_edges		<class 'int'>
	analyses_average_degree		<class 'float'>
	analyses_degree_std_dev		<class 'float'>
	analyses_diameter		<class 'int'>
	analyses_num_vertices		<class 'int'>
	analyses_degree_assortativity		<class 'float'>
	analyses_edge_reciprocity		<class 'float'>
	analyses_knn_proj_1		<class 'float'>
	analyses_largest_component_fraction		<class 'float'>
	analyses_vertex_properties		<class 'list'>
	analyses_hashimoto_radius		<class 'float'>
	analyses_is_bipartite		<class 'bool'>
	analyses_mixing_time		<class 'float'>
	num_nodes		<class 'int'>
	analyses_knn_proj_2		<class 'float'>
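The summary line in this output distinguishes unique edges from events: a node pair (v, w) is counted once as an edge no matter how often it occurs, while every time-stamped occurrence counts as a separate event. The toy example below illustrates that distinction in plain Python, without using pathpyG. It also converts the reported time interval to calendar dates under the assumption, which the output itself does not state, that the values are Unix timestamps in seconds. The event list and the helper name summarize are made up for illustration.

```python
from datetime import datetime, timezone


def summarize(events):
    """Count nodes, unique edges and events in a list of (source, target, time) tuples."""
    nodes = {v for s, t, _ in events for v in (s, t)}
    unique_edges = {(s, t) for s, t, _ in events}
    times = [time for _, _, time in events]
    return len(nodes), len(unique_edges), len(events), (min(times), max(times))


# Hypothetical toy data: 'a' contacts 'b' three times, 'b' contacts 'c' once.
toy_events = [("a", "b", 1.0), ("a", "b", 2.0), ("b", "c", 2.0), ("a", "b", 5.0)]
n_nodes, n_edges, n_events, interval = summarize(toy_events)
print(f"{n_nodes} nodes, {n_edges} unique edges and {n_events} events in {list(interval)}")

# Interval reported for email_company, read as Unix seconds (an assumption).
start, end = (
    datetime.fromtimestamp(ts, tz=timezone.utc)
    for ts in (1262454016.0, 1285884544.0)
)
print(start.date(), "to", end.date())
```

Read this way, the 82927 events of email_company are repeated occurrences of only 5784 distinct node pairs, and the interval covers roughly nine months of the year 2010.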
We can plot this temporal graph in an interactive way:
pp.plot(g, edge_color='lightgray', edge_size=5);